Name                                 Modified     Size
CIEMPIESS_Spanish_Models_581h.zip    2019-08-24   159.6 MB
README.txt                           2019-08-23   4.0 kB
LICENSE.txt                          2019-08-23   35.1 kB
-------------------------------------------------------------------------------------------------
                                The CIEMPIESS Spanish Models
             PocketSphinx Acoustic Models in Spanish made out of 581 hours of audio
                            by Dr. Carlos Daniel Hernández Mena
-------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------
PRESENTATION
-------------------------------------------------------------------------------------------------

The CIEMPIESS Spanish Models are acoustic models designed to work with PocketSphinx. The 581
hours of audio recordings used to train the models come from many datasets by LDC (including
the entire CIEMPIESS corpus except CIEMPIESS-TEST) and from other sources collected by the
social service program "Desarrollo de Tecnologías del Habla" and the CIEMPIESS-UNAM project.
Both belong to the "Universidad Nacional Autónoma de México" (UNAM) in Mexico City.

-------------------------------------------------------------------------------------------------
MODEL CHARACTERISTICS
-------------------------------------------------------------------------------------------------

- Most of the audio files used in the training stage contain clean speech. The training corpus
  mixes read and spontaneous speech in many accents of Spanish, including accents from Mexico,
  Spain and the rest of Latin America.

- The acoustic models are Continuous and Context Dependent (CD). 10,000 senones were used to
  create them.

- The audio format of the training files is Microsoft WAV, 16 kHz, 16-bit, mono.

- The pronouncing dictionary contains more than 285,000 words.

- The phonetic alphabet used in the pronouncing dictionary is called Mexbet. For more
  information about Mexbet see www.ciempiess.org

- The phonetic transcriptions used in the pronouncing dictionary were made using a G2P
  (grapheme-to-phoneme) tool called the "fonetica3 library". For more information see
  www.ciempiess.org

- The text used to build the language model comes from many sources, including Wikipedia,
  transcribed interviews and newspapers.

- The language model was created using SRILM.
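
Because the training audio listed above is 16 kHz, 16-bit, mono WAV, recordings passed to the
models will presumably work best in that same format. The following is a minimal sketch, not
part of this package, of how a file could be checked with Python's standard "wave" module.
The function names (matches_training_format, make_silence) are illustrative assumptions.

```python
import io
import wave


def matches_training_format(wav_file):
    """Return True if the WAV is 16 kHz, 16-bit, mono (the training format).

    wav_file may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as w:
        return (
            w.getframerate() == 16000   # 16 kHz sampling rate
            and w.getsampwidth() == 2   # 2 bytes per sample = 16 bits
            and w.getnchannels() == 1   # mono
        )


def make_silence(rate, width, channels, seconds=0.1):
    """Build a short in-memory WAV of silence (illustrative helper for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (int(rate * seconds) * width * channels))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(matches_training_format(make_silence(16000, 2, 1)))
    print(matches_training_format(make_silence(44100, 2, 2)))
```

A file that fails this check would need to be resampled or converted to mono before use; the
sketch only inspects the header and does not convert anything.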

-------------------------------------------------------------------------------------------------
TERMS OF USE
-------------------------------------------------------------------------------------------------

The CIEMPIESS Spanish Models by Carlos Daniel Hernández Mena are free software; you can
redistribute them and/or modify them under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 3 of the License, or (at your option)
any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; 
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
See the GNU General Public License for more details.


The CIEMPIESS Spanish Models were created in May 2019.

-------------------------------------------------------------------------------------------------
ACKNOWLEDGEMENTS
-------------------------------------------------------------------------------------------------

The author would like to thank Alejandro V. Mena, Elena Vera and Angélica Gutiérrez for their
support of the social service program "Desarrollo de Tecnologías del Habla". The author also
thanks the social service students for all their hard work.

-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------

For more information and documentation see the CIEMPIESS-UNAM Project website at:

		             http://www.ciempiess.org/

-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------

Source: README.txt, updated 2019-08-23